Multiclass Prediction with Partial Least Square Regression for Gene Expression Data: Applications in Breast Cancer Intrinsic Taxonomy

Authors

  • Chi-Cheng Huang
  • Shih-Hsin Tu
  • Ching-Shui Huang
  • Heng-Hui Lien
  • Liang-Chuan Lai
  • Eric Y. Chuang
Abstract

Multiclass prediction remains an obstacle for high-throughput data analysis such as microarray gene expression profiles. Despite recent advancements in machine learning and bioinformatics, most classification tools are limited to binary responses. Our aim was to apply partial least squares (PLS) regression to breast cancer intrinsic taxonomy, in which five distinct molecular subtypes have been identified. The PAM50 signature genes were used as predictive variables in the PLS analysis, and the latent gene component scores were used in a binary logistic regression for each molecular subtype. The 139 prototypical arrays used for PAM50 development served as the training dataset, and three independent microarray studies of Han Chinese origin were used for independent validation (n = 535). The agreement between PAM50 centroid-based single sample prediction (SSP) and PLS regression was excellent (weighted Kappa: 0.988) within the training samples but deteriorated substantially in independent samples, which could be attributed to the much larger number of samples left unclassified by PLS regression. When these unclassified samples were removed, the agreement between PAM50 SSP and PLS regression improved markedly (weighted Kappa: 0.829, as opposed to 0.541 when unclassified samples were included). Our study confirmed the feasibility of PLS regression in multiclass prediction, and distinct clinical presentations and prognostic discrepancies were observed across breast cancer molecular subtypes.
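The two quantitative steps in the abstract, turning one binary model per subtype into a single call and measuring agreement with weighted Kappa, can be sketched as follows. This is a minimal illustration, not the authors' implementation. The 0.5 threshold, the rule that a sample is "Unclassified" when no binary model reaches it, the linear weighting scheme, and the category ordering are all assumptions; the abstract does not specify them. The function names `assign_subtype` and `weighted_kappa` are hypothetical.

```python
from collections import Counter

# The five PAM50 intrinsic subtypes. The ORDER is an assumption: weighted
# Kappa depends on it, and the abstract does not state the order used.
SUBTYPES = ["Luminal A", "Luminal B", "HER2-enriched", "Basal-like", "Normal-like"]
UNCLASSIFIED = "Unclassified"


def assign_subtype(probs, threshold=0.5):
    """Combine per-subtype binary probabilities into one call.

    probs maps subtype name -> probability from that subtype's binary
    logistic model. Assumed rule: take the most probable subtype, or
    return "Unclassified" if even that one falls below the threshold.
    """
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else UNCLASSIFIED


def weighted_kappa(calls_a, calls_b, categories):
    """Linear-weighted Cohen's Kappa between two lists of calls."""
    k = len(categories)
    n = len(calls_a)
    index = {c: i for i, c in enumerate(categories)}
    observed = Counter((index[a], index[b]) for a, b in zip(calls_a, calls_b))
    row = Counter(index[a] for a in calls_a)
    col = Counter(index[b] for b in calls_b)
    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # linear disagreement weight
            num += w * observed[(i, j)] / n
            den += w * row[i] * col[j] / (n * n)
    return 1.0 if den == 0 else 1.0 - num / den


def drop_unclassified(calls_a, calls_b):
    """Keep only the sample pairs where neither method is unclassified."""
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if UNCLASSIFIED not in (a, b)]
    return [a for a, _ in pairs], [b for _, b in pairs]
```

To mirror the comparison reported in the abstract, Kappa would be computed twice: once with `SUBTYPES + [UNCLASSIFIED]` as the category list over all samples, and once with `SUBTYPES` alone after `drop_unclassified` has filtered the pairs.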

Related articles

Bioinformatics-Based Prediction of FUT8 as a Therapeutic Target in Estrogen Receptor-Positive Breast Cancer

Abstract Introduction: Estrogen receptor-positive (ER-positive) breast cancer is a subgroup of breast tumors that is more likely to respond to hormone therapy. ER-positive and ER-negative breast cancers tend to show different patterns of metastasis because of the different signaling cascades and genes that are activated by the estrogen response. Genetic factors can contribute to high rates of metastas...

Classification of proteomic data with multiclass Logistic Partial Least Squares algorithm

Early detection of cancer is crucial for successful treatments. In this paper, we propose a multiclass Logistic Partial Least Squares (LPLS) algorithm for classification of normal vs. cancer using Mass Spectrometry (MS). LPLS combines the multiclass logistic regression with Partial Least Squares (PLS) algorithm. Wavelet decomposition is also proposed for pre-processing of original data. Wavelet...

Improving biological activity prediction of protein kinase inhibitors using artificial neural network and partial least square methods

Introduction: Protein kinases cause many diseases, including cancer; therefore, inhibiting them plays an important role in the treatment of many diseases. Traditional discovery of inhibitors of these enzymes is a time-consuming and costly process. Finding reliable computer-aided drug discovery tools that can detect the inhibitors will reduce the cost. In this study, it is attempted to separate ki...

Identification of Prognostic Genes in Her2-enriched Breast Cancer by Gene Co-Expression Network Analysis

Introduction: HER2-enriched subtype of breast cancer has a worse prognosis than luminal subtypes. Recently, the discovery of targeted therapies in other groups of breast cancer has increased patient survival. The aim of this study was to identify genes that affect the overall survival of this group of patients based on a systems biology approach. Methods: Gene expression data and clinical infor...


Volume: 2013

Publication date: 2013